Multi-Factor Stock Analysis Using a Multilayer Perceptron

Background

Factor investing is a security selection approach that targets specific elements, or factors, that appear to drive the returns of an asset. The premise behind factor investing is that, using quantitative methods, various risk premia can be identified, isolated, and utilized to the investor's advantage. This idea is the cornerstone of almost all quantitative portfolio management methods. The impact factor investing has had on the financial industry is profound, so much so that the world's largest asset manager (BlackRock) has shifted massive amounts of assets away from "old school" fundamental managers to purely quantitative strategies, and mandated that whatever fundamental managers remained use more data-driven processes. Factor investing is not a new idea, but recent developments in computing and mathematics (machine learning) allow practitioners to approach it from an entirely different angle.

Perspective:

The model makes the following underlying assumptions:

  1. Predicting future returns is often considered a fool's errand by professionals. The reason is that there are too many factors that can impact the price of a given security at any given moment to account for them accurately. Furthermore, the impact and importance of these factors are constantly changing. Despite these challenges, most practitioners continue to try.
  2. Selecting the stocks with the highest probability of a desired outcome based on common factors will lead to a more beneficial outcome than selecting stocks based on an expected (predicted) return, assuming accurate predictions are not sustainable over longer periods of time.
  3. It is at least as important to avoid bad stocks as it is to invest in good ones. Selecting good stocks helps generate returns; avoiding bad stocks helps minimize losses.

The aforementioned makes clear that there could be significant benefits to framing the question of security selection as a classification problem rather than a regression problem. To solve such a problem we propose the use of a multilayer perceptron to classify stocks into one of two categories (binary classification): 1) at or above the median return of all stocks over the next year, or 2) below the median return over the next year.

Why MLP?

MLP vs Other Classification models

Neural networks offer some advantages for this task over other supervised classification algorithms. First and foremost, MLPs are able to model non-linear relationships. Second, where traditional factor investing processes require that we identify and manipulate features on our own, neural networks let us attack the problem directly using data. Lastly, neural networks are generally better at dealing with noisy data, a very common issue in finance.

MLP vs Other Neural Network Types - considerations

Panel Data

The data in question is panel data. Panel data is in a class of its own: simply put, it is not time series data, and it is not cross-sectional data either; it is in fact both. The multidimensional nature of panel data adds a clear layer of complexity in deciding which model to use. On one hand, the time series aspect of the data suggests that RNN or LSTM models would be better suited to the task. On the other hand, the tabular nature of the data suggests that the MLP may be the better fit. In the end, given that our objective was to classify rather than to predict a future price, the time element seemed secondary. We therefore chose to treat the data as cross-sectional and to minimize the effect of time, which made the multilayer perceptron the best choice for the task as framed.

Data

The model uses data from four different categories:

  1. Accounting data – data taken directly from a company's financial statements (income statement, balance sheet, cash flow statement).
  2. Trading data – data based on market activity over a given period of time.
  3. Valuation data – generally market data normalized by accounting data.
  4. Technical indicators – moving averages over various windows based on the price of the stock.

In all, the model uses 69 features for 504 different stocks over roughly 33 months.

The structure of the data may raise questions about the decision to use a standard feed-forward network as opposed to an RNN or an LSTM. This issue was taken into consideration, and the decision was based on the following:

  1. The data structure is not exactly sequential; the data used in the model is panel data, which has both time series and cross-sectional elements.
  2. The literature and academic justification for using neural networks on panel data is sparse.
  3. Because the model is structured as a classification problem, the time element is less meaningful, and as such the data can be viewed as purely tabular.

Stage 1 importing data and feature engineering

The datasets all come from three sources:

  1. Refinitiv Eikon (API) – a subscription-based financial data provider offering access to thousands of corporate and financial data sets for companies and markets around the globe.
  2. Yahoo Finance (API) – within the scope of this model, the Yahoo Finance API is used strictly for stock price history, for the sake of convenience.
  3. Datastream Web Services – a subscription-based financial and economic data provider that also offers access to data sets for companies, countries, and markets around the world.

To limit the need for multiple calls and to minimize the use of local storage, the raw data from the aforementioned sources is stored in an SQL database and updated regularly from a machine with access to an Eikon terminal (required for access to two of the three sources). The code for those calls is provided in a notebook titled data_collection_five_factor. The functions below import all the required data from the database. THIS CAN TAKE UP TO ONE HOUR TO COMPLETE; do not run if unnecessary.
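As a rough sketch of this import step, the helper below reads one table from an SQL store into a pandas DataFrame. The table name, schema, and the in-memory SQLite database are illustrative stand-ins; the real pipeline connects to the shared Eikon/Datastream-fed database.

```python
import sqlite3
import pandas as pd

def load_feature_table(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    # Pull one raw data table from the SQL store into a DataFrame.
    # In the real pipeline the connection would point at the shared
    # database, not an in-memory demo DB.
    return pd.read_sql(f"SELECT * FROM {table}", conn)

# Minimal demonstration against an in-memory database with a toy schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounting (ticker TEXT, date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO accounting VALUES (?, ?, ?)",
    [("AAA", "2020-01-31", 100.0), ("BBB", "2020-01-31", 250.0)],
)
conn.commit()

accounting = load_feature_table(conn, "accounting")
```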

Stage 2 Data Exploration

The following data evaluation process has three purposes.

  1. To identify any existing relationships among features or between features and the label.
  2. To understand the distribution of the data sets.
  3. To identify outliers and establish a need for any kind of transformation or standardization of the data.

Data Distribution Profile

The histogram charts below allow us to approximate the distribution of the various features fed into the model. Due to the asymmetric nature of many of the features, and the difficulty of identifying observations in the tails of the distribution via standard distribution plots, the data below was transformed to a logarithmic scale. Once transformed, it is clear that the data is fraught with outliers.
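To illustrate the transformation with synthetic data standing in for the actual features, a log scale compresses a long right tail so that tail observations become visible in a histogram:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a right-skewed, strictly positive feature
# (e.g. market cap), with a couple of extreme observations mixed in.
feature = np.concatenate([
    rng.lognormal(mean=0.0, sigma=1.0, size=1000),
    np.array([1e4, 5e4]),
])

# Log-scaling compresses the long right tail; the planted extremes now
# sit a few units from the bulk instead of orders of magnitude away.
log_feature = np.log10(feature)
```

Plotting `log_feature` with, say, `matplotlib.pyplot.hist` would then show the outliers as a clearly separated right tail.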

Correlations

Below is a basic correlation matrix. The matrix makes very clear that, for the most part, correlations are weak (close to zero), with pockets of stronger positive correlations among certain features (accounting features in particular). We can deduce the following from the correlation matrix:

  1. There is no feature with a strong linear relationship (positive or negative) to one-year forward returns.
  2. If a relationship between the features and forward returns does in fact exist, traditional methods like OLS regression will not provide us with a strong model.
  3. There is likely some multicollinearity among features that will have to be dealt with, either by feature elimination or L2 regularization.

Box plots

We are interested not only in the correlation between the features and one-year forward returns, but also in the distribution of those correlations. This allows us to better understand the impact any outliers may be having on the data. The box plots below illustrate this distribution. The correlations are computed on a date-by-date basis for the entire cross-section of stocks. The plot shows that, for the most part, correlations revolve around zero, but can reach as high as 0.7 for one-year volatility and as low as -0.47 for three-month momentum. The key takeaway from this segment is that not only are correlations between features and one-year forward returns low, as demonstrated by the correlation matrix, but the dispersion within the correlation data of every feature to one-year returns is significant in many cases.
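The date-by-date computation can be sketched as follows, using a small synthetic panel in place of the real feature set (the column names and the planted weak relationship are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_stocks = 50
rows = []
for d in pd.date_range("2020-01-01", periods=6, freq="D"):
    x = rng.normal(size=n_stocks)             # one feature, e.g. 3m momentum
    y = 0.2 * x + rng.normal(size=n_stocks)   # 1y forward return, weak link
    rows.append(pd.DataFrame({"date": d, "feature": x, "fwd_return": y}))
panel = pd.concat(rows, ignore_index=True)

# Cross-sectional correlation between the feature and forward returns,
# computed separately on each date; a box plot of this series shows the
# dispersion of the relationship through time.
per_date_corr = panel.groupby("date")[["feature", "fwd_return"]].apply(
    lambda g: g["feature"].corr(g["fwd_return"])
)
```

Feeding one such series per feature into a box plot reproduces the chart described above.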

Effectively, the data has forced us to draw one of two conclusions:

  1. These data sets don't matter, i.e. there is no significant relationship between any of the features and one-year forward returns.

OR

  2. The relationship between the features and one-year forward returns is non-linear and/or noisy, and therefore very difficult to pick up using correlations.

To decide between these we turn to domain knowledge, and based on said knowledge it is difficult to accept the idea that a collection of many of the most commonly used stock-picking features would have nothing to do with stock returns. As such, the second option appears more likely.

Data Prep

Outlier detection and Elimination

To eliminate outliers we chose the Isolation Forest anomaly detection algorithm. The general idea behind Isolation Forest is that, when dealing with large amounts of data, it is easier to isolate anomalies than to isolate normal points. The algorithm effectively works like a decision tree that randomly partitions the data, splitting at random values between the max and min of each feature. Because outliers lie far from the bulk of the data, they tend to be isolated in fewer splits, i.e. closer to the root of the tree.
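A minimal sketch of this step using scikit-learn's IsolationForest on synthetic data; the contamination value below is an illustrative choice, not the ~22.5% share that the fitted model actually discarded:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# 500 "normal" observations plus two planted extremes.
normal = rng.normal(size=(500, 3))
extreme = np.array([[12.0, -15.0, 20.0], [25.0, 30.0, -18.0]])
X = np.vstack([normal, extreme])

# contamination is the assumed share of outliers in the sample;
# here a small demo value, tuned to the data in the real pipeline.
iso = IsolationForest(contamination=0.01, random_state=0)
flags = iso.fit_predict(X)    # -1 = outlier, 1 = inlier
X_clean = X[flags == 1]
```

The planted extremes are isolated in very few random splits and end up flagged, while the bulk of the data survives the filter.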

Label Creation

As previously discussed, the model's objective is to classify stocks into top-50% gainers and bottom-50% gainers. We decided on the median because, in addition to being a widely used measure of central tendency, it ensures a fairly balanced training set regardless of the data's distribution. This ensures that the model evaluates stocks on a relative basis, which we believe to be advantageous despite the potential misclassifications.
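The labeling rule takes only a few lines; the tickers and returns below are made up for illustration:

```python
import pandas as pd

# Hypothetical one-year forward returns for four stocks.
fwd_returns = pd.Series(
    {"AAA": 0.12, "BBB": -0.05, "CCC": 0.30, "DDD": 0.02},
    name="fwd_1y_return",
)

# Label 1 if the stock's forward return is at or above the cross-sectional
# median, 0 otherwise; by construction the classes are roughly balanced.
labels = (fwd_returns >= fwd_returns.median()).astype(int)
```

Here the median is 0.07, so AAA and CCC receive label 1 while BBB and DDD receive label 0.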

Train Test Split and Feature Scaling

Based on the Isolation Forest model used to detect outliers, we discarded about 22.5% of the original observations as extreme. The remaining data must now be scaled and split into training and test sets. Due to the balanced nature of the data and the relatively small number of observations, we decided to split the data 80% train and 20% test.
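A sketch of the split-and-scale step on synthetic data; note that the scaler is fit on the training set only, which is standard practice to avoid leaking test-set statistics (the original notebook may differ in details):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(400, 10))  # stand-in feature matrix
y = rng.integers(0, 2, size=400)                    # stand-in binary labels

# 80/20 split; stratify keeps the label balance in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training set only, then apply it to both.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```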

Model Selection

Tuning a MLP

We've used the Keras package to construct our model. We chose to build a sequential model with three hidden layers of multiple nodes, all of which use the 'relu' activation function, and an output layer with a single node that uses the sigmoid activation function. We use the binary cross-entropy loss function and the Adam optimizer. We also include dropout after every hidden layer to help minimize potential overfitting, and use L2 regularization to help with any effects of multicollinearity. Lastly, we use the Keras RandomSearch tuner to identify the ideal number of nodes in each hidden layer and the most effective learning rate for the optimizer. The tuner objective is set to minimize the loss on the validation (test) set.
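A sketch of that architecture in Keras; the layer widths, dropout rates, L2 strength, and learning rate shown here are placeholders for the values the RandomSearch tuner would actually select:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

n_features = 69  # matches the feature count described above

# Three ReLU hidden layers with dropout and L2 regularization,
# and a single sigmoid output node for binary classification.
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),
    layers.Dense(32, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),
    layers.Dense(16, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["binary_accuracy", keras.metrics.Precision(), keras.metrics.Recall()],
)
```

In the project, a `keras_tuner.RandomSearch` run over the layer widths and learning rate replaces the hard-coded values above.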

Tuning results and Evaluation of Selected Model

Once the tuner has identified the best model, we retrain it using the identified parameters in order to evaluate its performance over all iterations, based on the following metrics:

  1. loss - (prediction error of the model)
  2. binary accuracy - (how often the predicted label equals the true label)
  3. Precision - (how many of the positive results the model returned were true positives)
  4. Recall - (how many true positives did the model select out of the total number of true positives in the data)
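The four metrics can be made concrete on a toy set of predictions (the values below are illustrative, not model output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, log_loss

# Toy labels and predicted probabilities for eight observations.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])
y_pred = (y_prob >= 0.5).astype(int)   # threshold at 0.5

loss = log_loss(y_true, y_prob)              # binary cross-entropy
accuracy = accuracy_score(y_true, y_pred)    # share of correct labels
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
```

With these toy numbers there are 3 true positives, 1 false positive, and 1 false negative, so accuracy, precision, and recall all come out to 0.75.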

Evaluation of Results

Both the training and testing loss decline steadily over the first 125 epochs. At that point the validation set begins to level off, and the model can no longer improve on the loss. By epoch 150 the model appears to be slightly overfitting (0.61 train vs 0.68 validation), but this does not seem extreme or to impact the other metrics at the moment. Binary accuracy climbs steadily for both the training and validation sets throughout training, though the climb begins to taper off around 0.62. While this may feel like a low number, context is important: if an algorithm that could successfully pick the top 50% of stocks with 90% accuracy existed, everyone would adopt it, and as adoption grew its accuracy and predictive power would decline. As such, the 62-65% accuracy range appears reasonable. The precision data indicates that our model is doing a reasonable job of avoiding bad stocks. The weakest number is in fact the recall, which shows that the model is having difficulty isolating good stocks, i.e. higher false negatives.

Implementation

We are now ready to put our model to use in a simulated environment. To do this, we first build a pipeline that takes in raw data and creates features that are free of outliers and properly scaled, as well as a label based on the expected one-year return for each stock at each point in time. This data is then used to train our model, which in turn makes a classification for all stocks based on the next day's data. Simply put, we use three months of data to train our model and make one prediction based on that data. The data is then discarded, and a fresh set of data is used to train the next iteration.
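The rolling train-predict-discard scheme can be sketched as follows; `train_model` and the majority-class demo model are hypothetical stand-ins for the fitted MLP pipeline:

```python
import numpy as np

def walk_forward(features, labels, train_model, window=3):
    """Train on a rolling `window` of past periods, classify on the next
    period, then discard the window. `features` and `labels` are dicts
    keyed by period; `train_model(X, y)` returns a predict function."""
    predictions = {}
    periods = sorted(features.keys())
    for i in range(window, len(periods)):
        train_periods = periods[i - window:i]
        X_train = np.vstack([features[p] for p in train_periods])
        y_train = np.concatenate([labels[p] for p in train_periods])
        predict = train_model(X_train, y_train)
        # Classify the cross-section at the next period only.
        predictions[periods[i]] = predict(features[periods[i]])
    return predictions

# Demo with synthetic data and a trivial majority-class "model".
rng = np.random.default_rng(4)
features = {p: rng.normal(size=(10, 3)) for p in range(6)}
labels = {p: rng.integers(0, 2, size=10) for p in range(6)}

def majority_model(X, y):
    majority = int(round(y.mean()))
    return lambda X_new: np.full(len(X_new), majority)

preds = walk_forward(features, labels, majority_model, window=3)
```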

A word on regime change

Regime change in the context of the stock market is when "something changes" and the market no longer behaves the way it had previously. It is precisely the issue of regime change that makes stock selection particularly difficult to do successfully over long periods of time. Regime change is always obvious in hindsight, and almost never visible in advance. Clearly, any model that searches for patterns in data can have a particularly difficult time classifying properly once stocks begin to behave differently than they had previously. To deal with this, our model is constructed to discard old data once it is used, and never to re-introduce it into the model. This is also the reason the model is structured to "trade" quarterly.

Trading Engine

The trading engine is one of the more difficult parts of this project. It requires that we simulate both a market and a financial services firm at the same time. This is done by taking predetermined transactions for the list generated above, executing them using historical prices, and updating performance between dates. The process is also designed to ensure that sufficient funds exist for transactions, that additional lots are properly accounted for, that performance is continuously calculated, that positions are known at all times, and that no short selling is allowed. Though there are similarities across models, each model has its own trading engine based on the guidelines established for it.
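A minimal sketch of such an engine, reduced to equal-weight, long-only rebalancing with a cash check and mark-to-market valuation; lot handling, transaction costs, and the model's other guidelines are omitted:

```python
class TradingEngine:
    """Toy version of the simulation described above: equal-weight buys
    from a target list, long-only, cash checked before every purchase,
    positions marked to market at each date."""

    def __init__(self, cash):
        self.cash = cash
        self.positions = {}   # ticker -> share count

    def rebalance(self, targets, prices):
        # Sell anything not on the new target list at current prices.
        for ticker in list(self.positions):
            if ticker not in targets:
                self.cash += self.positions.pop(ticker) * prices[ticker]
        # Split available cash equally across the targets (no shorting).
        budget = self.cash / len(targets)
        for ticker in targets:
            shares = int(budget // prices[ticker])
            cost = shares * prices[ticker]
            if shares > 0 and cost <= self.cash:   # funds check
                self.positions[ticker] = self.positions.get(ticker, 0) + shares
                self.cash -= cost

    def value(self, prices):
        # Mark to market: cash plus positions at current prices.
        return self.cash + sum(n * prices[t] for t, n in self.positions.items())

engine = TradingEngine(cash=10_000.0)
engine.rebalance(["AAA", "BBB"], {"AAA": 50.0, "BBB": 25.0})
```

Running this splits the 10,000 evenly: 100 shares of AAA and 200 of BBB; repricing AAA at 60 marks the portfolio at 11,000.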

Financial Evaluation

Portfolios are usually evaluated in both risk-adjusted and absolute terms. Evaluating a portfolio in absolute terms is oftentimes as simple as looking at performance over time and comparing it to a peer or benchmark, whereas evaluating a portfolio in risk-adjusted terms usually requires that the analyst "level the playing field" somehow (math is usually involved).

Performance Chart

In absolute terms the portfolio performance is decent for the most part. We do see periods of underperformance initially, but the model did a good job protecting the downside during the start of the COVID pandemic. In fact, at the market's trough on March 23, 2020, the market was down 24% from the start of the evaluation, while the model was down only 15%. Since that time the model has mostly outperformed in absolute terms. The model did have some difficulty with the regime change as the market rotated from growth to value in late January 2021, but it recovered nicely after the next trading date.

Risk Adjusted Metrics

We use five major risk-adjusted metrics to evaluate our portfolio:

  1. Sharpe ratio - the average return less the risk-free rate (currently 0), normalized by the standard deviation of portfolio returns (a proxy for risk).
  2. Sortino ratio - similar to the Sharpe ratio, but uses a minimum accepted return (set to 0) and is normalized by downside deviation instead of standard deviation (a proxy for "bad" risk).
  3. Treynor ratio - also similar to the Sharpe ratio, but normalized by the OLS regression coefficient of the portfolio's returns on the market (a.k.a. beta).
  4. Max drawdown - a measure of risk that identifies the portfolio's worst peak-to-trough performance over the time frame.
  5. Calmar ratio - the portfolio's average return normalized by the absolute value of its max drawdown.
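Most of these metrics reduce to a few lines on per-period returns; the sketch below uses synthetic daily returns and omits annualization (which the project's actual figures may apply):

```python
import numpy as np

def sharpe(returns, rf=0.0):
    # Mean excess return over its standard deviation (per period).
    ex = returns - rf
    return ex.mean() / ex.std()

def sortino(returns, mar=0.0):
    # Mean excess return over downside deviation only.
    downside = np.minimum(returns - mar, 0.0)
    return (returns - mar).mean() / np.sqrt((downside ** 2).mean())

def max_drawdown(returns):
    # Worst peak-to-trough decline of the cumulative wealth path.
    wealth = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(wealth)
    return ((wealth - peaks) / peaks).min()   # most negative value

def calmar(returns):
    # Average return over the absolute value of max drawdown.
    return returns.mean() / abs(max_drawdown(returns))

rng = np.random.default_rng(5)
returns = rng.normal(loc=0.001, scale=0.01, size=252)  # synthetic daily returns
```

The Treynor ratio needs market returns as well (beta from an OLS regression); see the CAPM section below.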

The risk-adjusted metrics paint a much brighter picture for the model, as ALL of them show the model outperforming the market nicely.

CAPM

The capital asset pricing model, or CAPM, is one of the most widely used models in finance. It is used to evaluate risk, calculate expected returns, and measure portfolio outperformance. It is interpreted as follows:

  1. An asset's/portfolio's expected return is a linear function of its sensitivity to market variance.
  2. The asset's expected return is calculated as: expected return = risk-free rate + beta x (market return - risk-free rate).
  3. Any returns in excess of the expected return are considered outperformance or underperformance; this is known as alpha.
  4. Positive alpha is the goal of every portfolio manager on earth.
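The alpha and beta in points 2 and 3 fall out of an OLS regression of portfolio excess returns on market excess returns; the sketch below uses synthetic returns with a built-in alpha and beta rather than the model's actual results:

```python
import numpy as np

rng = np.random.default_rng(6)
rf = 0.0  # risk-free rate, set to 0 as in the text
market = rng.normal(loc=0.0005, scale=0.01, size=252)  # market returns

# Synthetic portfolio with a known beta and a small built-in alpha.
true_alpha, true_beta = 0.0002, 1.2
portfolio = true_alpha + true_beta * market + rng.normal(scale=0.005, size=252)

# OLS of portfolio excess returns on market excess returns:
# intercept = alpha, slope = beta (per period, not annualized).
X = np.column_stack([np.ones_like(market), market - rf])
alpha, beta = np.linalg.lstsq(X, portfolio - rf, rcond=None)[0]
```

The fitted slope recovers a beta near 1.2, and the intercept recovers the small positive alpha; the same regression's beta feeds the Treynor ratio above.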